Counterfactual Training

Update Meeting Dec 2024

Delft University of Technology

Arie van Deursen
Cynthia C. S. Liem

December 18, 2024

Status

  • Code base: In place and streamlined for reproducibility and configuration.
  • Experiments: Substantial work completed; results largely support the idea.
    • Ran into problems on DelftBlue, which has set me back about 2 weeks.
  • Paper: Still bare-bones.
  • ICML: A submission is still possible, but it would be rushed and not “finished”.

Problems on Cluster

  • Trying to distribute:
    1. Models/experiments across processes.
    2. For each model/experiment, distribute the counterfactual search across processes.
  • Out-of-memory issues, data races, …
  • Multi-processing for models & multi-threading for counterfactual search: low CPU efficiency on DelftBlue (jobs get cancelled).
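The intended two-level distribution can be sketched as follows; all names here (the search function, model identifiers, worker counts) are illustrative stand-ins, not the actual code base:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def counterfactual_search(sample):
    # Hypothetical stand-in for the per-sample counterfactual search.
    return sample * 2

def run_experiment(model_id, samples):
    # Inner level: multi-threading across counterfactual searches
    # within a single model/experiment.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return model_id, list(pool.map(counterfactual_search, samples))

def run_all(model_ids, samples):
    # Outer level: one process per model/experiment. Combined with the
    # threaded inner level, this is the pattern that showed low CPU
    # efficiency on DelftBlue.
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_experiment, m, samples) for m in model_ids]
        return dict(f.result() for f in futures)

if __name__ == "__main__":
    print(run_all(["model_a", "model_b"], [1, 2, 3]))
```

One known pitfall with this pattern is that thread pools inside forked worker processes can leave most allocated cores idle, which would explain the scheduler cancelling jobs for low CPU efficiency.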

High-Level Idea

Combine ideas from Energy-Based Models and Adversarial Training:

\[ \ell_{\text{clf}}(f(x),y) + \lambda_{\text{gen}} \ell_{\text{gen}}(x^\prime,x) + \lambda_{\text{adv}} \ell_{\text{clf}}(f(x^\prime),y) + \text{r} \]

  • \(x^\prime\) are counterfactuals of \(x\).
  • \(\ell_{\text{gen}}\) is the difference in energies between observed samples and counterfactuals.
  • Counterfactuals are recycled as adversarial examples.
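A minimal numerical sketch of this objective, assuming a JEM-style energy (negative log-sum-exp of the classifier logits) for \(\ell_{\text{gen}}\); the energy definition, the \(\lambda\) defaults, and all function names are assumptions for illustration:

```python
import numpy as np

def energy(logits):
    # JEM-style energy of a sample: negative log-sum-exp of the classifier
    # logits (an assumption; the slide only specifies an energy difference).
    return -np.log(np.exp(logits).sum(axis=-1))

def cross_entropy(logits, y):
    # Softmax cross-entropy, numerically stabilised.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def combined_loss(logits_x, logits_xp, y, lam_gen=0.1, lam_adv=0.1, reg=0.0):
    # Weighted sum from the slide; lambda values here are placeholders.
    l_clf = cross_entropy(logits_x, y)                     # classifier loss on x
    l_gen = (energy(logits_x) - energy(logits_xp)).mean()  # energy gap x vs. x'
    l_adv = cross_entropy(logits_xp, y)                    # classifier loss on x'
    return l_clf + lam_gen * l_gen + lam_adv * l_adv + reg
```

The adversarial term reuses the classifier loss on the counterfactuals' logits, which is what "recycling counterfactuals as adversarial examples" amounts to here.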

Training Details

During each EPOCH:

  1. Generate nce counterfactuals and distribute them across the mini-batches.
  2. For each batch, compute:
    • Classifier loss: \(\ell_{\text{clf}}(f(x),y)\)
    • Generator loss: \(\ell_{\text{gen}}(x^\prime,x)\)
    • Adversarial loss: \(\ell_{\text{clf}}(f(x^\prime),y)\)
  3. Combine the losses with weights \(\lambda_{\text{gen}}\) and \(\lambda_{\text{adv}}\), backpropagate, and update the parameters.
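The per-epoch loop above can be sketched schematically; the `losses` dictionary, the default \(\lambda\) values, and the batch interface are hypothetical placeholders, not the actual implementation:

```python
def epoch_step(batches, cf_batches, losses, lam_gen=0.1, lam_adv=0.1):
    # One epoch of counterfactual training (schematic). `batches` yields
    # (x, y) pairs, `cf_batches` the counterfactuals distributed over them,
    # and `losses` maps names to loss callables (illustrative interface).
    total = 0.0
    for (x, y), x_cf in zip(batches, cf_batches):
        l_clf = losses["clf"](x, y)     # classifier loss on observed data
        l_gen = losses["gen"](x_cf, x)  # energy gap between x' and x
        l_adv = losses["clf"](x_cf, y)  # counterfactuals as adversarial examples
        total += l_clf + lam_gen * l_gen + lam_adv * l_adv
        # A real implementation would backpropagate here and update parameters.
    return total
```

Generating the counterfactuals once per epoch (rather than per batch) amortises the cost of the search, at the price of the counterfactuals becoming slightly stale as the model parameters move within the epoch.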